Large-Scale Bayesian Logistic Regression for Text Categorization
Authors
Abstract
Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results.
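As background on why a Laplace prior yields sparse models, the following is the standard textbook formulation, not something stated in the abstract itself. The notation is assumed here for illustration: $\beta$ for the coefficient vector, $\lambda$ for the prior scale parameter, $x_i$ for the feature vector of document $i$, and $y_i \in \{-1, +1\}$ for its class label.

```latex
% Independent Laplace (double-exponential) prior on each coefficient
p(\beta_j \mid \lambda) = \frac{\lambda}{2} \exp\bigl(-\lambda \lvert \beta_j \rvert\bigr)

% Logistic likelihood for a label y_i in {-1, +1}
p(y_i \mid x_i, \beta) = \frac{1}{1 + \exp\bigl(-y_i \, \beta^{\top} x_i\bigr)}

% MAP estimate: log-likelihood plus log-prior, dropping terms constant in beta
\hat{\beta} = \arg\max_{\beta}
  \Bigl[ -\sum_{i} \log\bigl(1 + \exp(-y_i \, \beta^{\top} x_i)\bigr)
         - \lambda \sum_{j} \lvert \beta_j \rvert \Bigr]
```

Under these assumptions, the posterior mode coincides with the L1-penalized (lasso) logistic regression estimate. Because the penalty term is not differentiable at zero, the maximizer typically sets many coefficients exactly to zero, which is the sense in which the resulting models are sparse.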
Similar resources
Sparse Logistic Regression for Text Categorization
This paper studies regularized logistic regression and its application to text categorization. In particular we examine a Bayesian approach, lasso logistic regression, that simultaneously selects variables and provides regularization. We present an efficient training algorithm for this approach, and show that the resulting classifiers are both compact and have state-of-the-art effectiveness on ...
Bayesian Text Categorization
Natural language processing is an interdisciplinary field of research which studies the problems and possibilities of automated generation and understanding of natural human languages. Text categorization is a central subfield of natural language processing. Automatically assigning categories to digital texts has a wide range of applications in today’s information society—from filtering spam to...
A sparse version of the ridge logistic regression for large-scale text categorization
The ridge logistic regression has successfully been used in text categorization problems and it has been shown to reach the same performance as the Support Vector Machine but with the main advantage of computing a probability value rather than a score. However, the dense solution of the ridge makes its use unpractical for large scale categorization. On the other side, LASSO regularization is ab...
A new term-weighting scheme for naïve Bayes text categorization
Purpose – Automatic text categorization has applications in several domains, for example e-mail spam detection, sexual content filtering, directory maintenance, and focused crawling, among others. Most information retrieval systems contain several components which use text categorization methods. One of the first text categorization methods was designed using a naïve Bayes representation of the...
A flexible Bayesian generalized linear model for dichotomous response data with an application to text categorization
Abstract: We present a class of sparse generalized linear models that include probit and logistic regression as special cases and offer some extra flexibility. We provide an EM algorithm for learning the parameters of these models from data. We apply our method in text classification and in simulated data and show that our method outperforms the logistic and probit models and also the elastic n...
Journal: Technometrics
Volume: 49
Publication date: 2007